Automatic Word Sense Disambiguation Using Cooccurrence and Hierarchical Information

Authors

David Fernández-Amorós

Ruben Heradio

José Antonio Cerrada

Carlos Cerrada Somolinos

Abstract

Here we review in detail a polished version of the systems with which we participated in the SENSEVAL-2 competition's English tasks (all-words and lexical sample). The approach combines selectional preferences measured over a large corpus with hierarchical information taken from WordNet, as well as some additional heuristics. We use this information to expand the glosses of the senses in WordNet, and we compare the similarity between context vectors and word sense vectors in a way similar to that used by Yarowsky and Schuetze. A supervised extension of the system is also discussed. We provide a new, previously unpublished evaluation over the SemCor collection, which is two orders of magnitude larger than the SENSEVAL-2 collections, as well as a comparison with baselines. Our systems ranked first among unsupervised systems in both tasks. We note that the method is very sensitive to the quality of the characterizations of word senses: glosses turned out to be much better characterizations than training examples.
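The gloss-expansion-and-similarity idea described in the abstract can be illustrated with a small sketch. This is not the authors' system: the sense inventory, hypernym glosses, cooccurrence neighbours, stopword list and the weights `hyper_weight` and `cooc_weight` are all hypothetical toy values, and plain cosine similarity over bag-of-words counts stands in for whatever weighting the original system uses. Each sense vector starts from its own gloss, is expanded with down-weighted words from its hypernym's gloss (the hierarchical information) and from corpus cooccurrence neighbours, and the sense whose vector is closest to the context vector is chosen.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data, NOT real WordNet content: each sense has a gloss and a
# hypernym whose gloss supplies the "hierarchical" expansion.
SENSES = {
    "bank%finance": {"gloss": "institution that accepts deposits and lends money",
                     "hypernym": "financial_institution"},
    "bank%river": {"gloss": "sloping land beside a body of water",
                   "hypernym": "slope"},
}
HYPERNYM_GLOSSES = {
    "financial_institution": "organization that handles money and credit",
    "slope": "incline of land or ground",
}
# Hypothetical cooccurrence neighbours, standing in for statistics that the
# paper measures over a large corpus.
COOCCURRENCE = {"water": ["river", "fish"], "money": ["loan", "cash"]}
STOPWORDS = {"a", "and", "at", "he", "of", "on", "or", "that", "the"}


def bag_of_words(text: str) -> Counter:
    """Lower-case, split on whitespace and drop stopwords."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)


def sense_vector(sense: str, hyper_weight: float = 0.5,
                 cooc_weight: float = 0.25) -> Counter:
    """Gloss words plus down-weighted hypernym-gloss words and cooccurrence neighbours."""
    info = SENSES[sense]
    vec = bag_of_words(info["gloss"])
    for word, count in bag_of_words(HYPERNYM_GLOSSES[info["hypernym"]]).items():
        vec[word] += hyper_weight * count
    # Iterate over a snapshot so the additions below do not affect the loop.
    for word, weight in list(vec.items()):
        for neighbour in COOCCURRENCE.get(word, []):
            vec[neighbour] += cooc_weight * weight
    return vec


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(weight * v[word] for word, weight in u.items() if word in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def disambiguate(context: str, candidate_senses: list) -> str:
    """Return the candidate sense whose expanded vector best matches the context."""
    context_vec = bag_of_words(context)
    return max(candidate_senses, key=lambda s: cosine(context_vec, sense_vector(s)))


if __name__ == "__main__":
    print(disambiguate("he sat on the river bank watching the water", list(SENSES)))
    print(disambiguate("he deposits money at the bank", list(SENSES)))
```

With these toy values the first context selects `bank%river` (it shares "water" with the gloss and "river" with the cooccurrence expansion) and the second selects `bank%finance`. Down-weighting the expansion words keeps each sense's own gloss dominant while still giving partial credit for related vocabulary.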


Related articles

Subject-Dependent Co-Occurrence and Word Sense Disambiguation

We describe a method for obtaining subject-dependent word sets relative to some (subject) domain. Using the subject classifications given in the machine-readable version of Longman's Dictionary of Contemporary English, we established subject-dependent cooccurrence links between words of the defining vocabulary to construct these "neighborhoods". Here, we describe the application of these neighbo...


UOY: A Hypergraph Model For Word Sense Induction & Disambiguation

This paper is an outcome of ongoing research and presents an unsupervised method for automatic word sense induction (WSI) and disambiguation (WSD). The induction algorithm is based on modeling the cooccurrences of two or more words using hypergraphs. WSI takes place by detecting high-density components in the cooccurrence hypergraphs. WSD assigns to each induced cluster a score equal to the sum...


Graph-based Word Clustering using a Web Search Engine

Word clustering is important for automatic thesaurus construction, text classification, and word sense disambiguation. Recently, several studies have reported using the web as a corpus. This paper proposes an unsupervised algorithm for word clustering based on a word similarity measure by web counts. Each pair of words is queried to a search engine, which produces a co-occurrence matrix. By cal...


Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is mainly used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...


Determining Word Sense Dominance Using a Thesaurus

The degree of dominance of a sense of a word is the proportion of occurrences of that sense in text. We propose four new methods to accurately determine word sense dominance using raw text and a published thesaurus. Unlike the McCarthy et al. (2004) system, these methods can be used on relatively small target texts, without the need for a similarly-sense-distributed auxiliary text. We perform an...



Publication date: 2010
